June 27, 2017

WAS R Git

Who are we?

Robert Schlegel

My name is Rob and I love fish and kelp.

Morgan Brand

What was there?

The relevant not so distant past

  • R
  • LaTeX
  • Git

What is there?

The present and the future

  • RStudio
  • R Markdown
    • Bookdown
    • Thesisdown
    • Blogdown
  • GitHub

What are our roles as 'modern' scientists?

  • Communication
    • Twitter
    • Popular articles
    • Public speaking
  • Interdisciplinary work
  • Collaboration
    • Beyond your lab

A traditional approach to the scientific method

  1. Devise a fancy question and call it a hypothesis

  2. Formulate a means of collecting the relevant data

  3. Import data set into statistical software package

  4. Run the procedure to get results

  5. Copy and paste appropriate pieces from the analysis into document editor

  6. Add descriptions

  7. Finish/submit report for comments

    REPEAT steps 2 - 7 indefinitely after receiving comments...

Disadvantages of this process

  • The process of data capture is not open

  • Lots of manual work (prone to errors)

  • Tedious (who likes to carefully copy-and-paste?)

  • Likely not recorded (did you write down all the steps you followed to get your analysis?)

  • What if you made an error at the beginning of your analysis? If your data had an error? If your hypothesis was biased?

Why R?

  • R is a free software environment for statistical computing and graphics.

  • It excels in helping you with:
    • data manipulation
    • automation
    • reproducibility
    • improved accuracy
    • error finding
    • customizability
    • beautiful visualizations
  • Any downsides?
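A few of the strengths listed above can be sketched in a handful of lines of base R. This is only an illustration, not part of the original talk: it assumes the built-in `iris` data set as a stand-in for your own data, and the helper `summarise_by_species()` and the file name `petal_length.pdf` are hypothetical names chosen for the example.

```r
# Built-in example data, so the script runs on any R installation
data(iris)

# Data manipulation: mean petal length per species
petal_means <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
print(petal_means)

# Automation: wrap a repeated step in a function (hypothetical helper)
summarise_by_species <- function(df, column) {
  out <- aggregate(df[[column]], by = list(Species = df$Species), FUN = mean)
  names(out)[2] <- column  # give the summary column a meaningful name
  out
}
sepal_means <- summarise_by_species(iris, "Sepal.Length")
print(sepal_means)

# Visualisation: write the figure to a file so the same script
# always regenerates it (reproducibility)
plot_file <- file.path(tempdir(), "petal_length.pdf")
pdf(plot_file)
boxplot(Petal.Length ~ Species, data = iris,
        xlab = "Species", ylab = "Petal length (cm)",
        main = "Petal length by species")
dev.off()
```

Because every step lives in the script, rerunning it after a data correction redoes the summary and the figure with no copy-and-paste.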

R vs. RStudio

R is the underlying language and computing environment. Used on its own, it favours the command line programmer.

RStudio is a powerful user interface that helps you get better control of your analysis.

  • Like R, it is completely free.

  • You can write your entire paper/report (text, code, analysis, graphics, etc.) in a single document format called R Markdown.

  • When you update any of your code and re-knit the document, R Markdown regenerates your plots and analysis output and creates an updated PDF file.

  • No more copy-and-paste!

What is R Markdown?

  • “Literate programming”

  • Embed R code in a Markdown document

  • Renders textual output along with graphics
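To make the three points above concrete, here is a minimal sketch of an R Markdown document, built and saved from R itself. The document title, chunk label and file name are invented for illustration, and the rendering step is only attempted if the `rmarkdown` package and pandoc happen to be installed.

```r
# A code fence is three backticks; build it here so it can be reused
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, prose with inline R, one chunk
rmd_lines <- c(
  "---",
  "title: \"A minimal report\"",
  "output: html_document",
  "---",
  "",
  "The cars data set has `r nrow(cars)` rows.",
  "",
  paste0(fence, "{r speed-plot}"),
  "plot(cars)",
  fence
)

# Save the document (hypothetical file name, in a temporary folder)
rmd_file <- file.path(tempdir(), "minimal-report.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML only if the tooling is available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  message("Rendered: ", html_file)
}
```

In everyday use you would write the `.Rmd` file directly in RStudio and press Knit. Building it from a script here simply keeps the example self-contained.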

Bookdown with R Markdown

Bookdown is one of the more recent additions to the R-universe.

Some highlights are:

  • Multiple output formats

  • Focus on writing the content, not typesetting

  • Readers can interact with examples

  • Feedback and contributions as the book is developed

  • Integrates with version control

Thesisdown with R Markdown

Thesisdown is built on bookdown

Output is currently available in four formats:

  • PDF
  • Word
  • ePub
  • HTML and Gitbook

Thesisdown with R Markdown - PDF

Thesisdown with R Markdown - Files

Thesisdown with R Markdown - YAML

Blogdown with R Markdown

You can now amplify your online voice by reusing the tools from your research workflow to present your work as a blog!

  • The R package blogdown allows you to create websites using R Markdown

The website is generated from R Markdown documents, so all of your

  • results
  • analysis
  • graphics

can be computed and rendered dynamically from R code to your website!

Blogdown

What about Git?

What are Git and GitHub?

Git is a version control system that lets you track changes to files over time

  • Git manages the evolution of a set of files, called a repository

GitHub is a website for storing your Git-versioned files remotely

  • GitHub provides a home for your Git-based projects on the internet

  • If you are a student, you can get the micro account, which includes 5 private repositories, for free!

GitHub

Reproducible Research

“Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.”

Donald Knuth, Literate Programming (1984)

Reproducible Research

“Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them.”

Roger Peng, Johns Hopkins